Parameterisation

Method of Moments

Recall that the method of moments involves calculating sample statistics, such as the mean and variance, from the actual data and then matching these to the corresponding moments of a known analytic distribution

Example

You have a set of data with a mean of 23 and a standard deviation of 12. You believe the data come from a lognormal distribution $\ln N(\mu, \sigma^2)$. Calculate $\mu$ and $\sigma$ using the method of moments.
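
A sketch of the calculation, using the standard moment formulas for the lognormal distribution (figures rounded):

If $S \sim \ln N(\mu, \sigma^2)$ then $E[S] = e^{\mu + \frac{\sigma^2}{2}}$ and $var(S) = (e^{\sigma^2}-1)e^{2\mu + \sigma^2}$

$\therefore e^{\sigma^2}-1 = \frac{var(S)}{E[S]^2} = \frac{12^2}{23^2} \Rightarrow \sigma^2 = \ln(1 + \frac{144}{529}) \approx 0.241 \Rightarrow \sigma \approx 0.49$

$\therefore \mu = \ln(23) - \frac{\sigma^2}{2} \approx 3.135 - 0.120 \approx 3.015$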

Fitting the Lognormal Distribution to Price Data

Price data is not a simple set of values from a distribution.

There are a number of issues we need to be careful about:

  • We would naturally assume the price process to be geometric rather than arithmetic
  • The distribution should describe the price changes, not the price level
  • We will typically have values on Monday, Tuesday, Wednesday, Thursday and Friday but not at the weekend

How do we solve these problems?

  • First we need to take logs of our data
  • Then we take differences

We now have data that we would expect to fit a normal distribution

We still have to solve the weekend problem

We could

  • Make the assumption that the weekend is a three-day period and hence multiply both the mean and variance of our one-day distribution by 3
  • Make the assumption that, as markets are closed, we can treat Friday to Monday as a single one-day step

Both are just assumptions, and it makes more sense to test the validity of each against the data that we have

Essentially we are dealing with a composite process that combines a one-day process with a three-day process

By examining the spreadsheet Fitting volatility to weekly data.xls we can observe a very interesting phenomenon
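
As a rough illustration of how such a test could be set up (the price series below is a hypothetical placeholder, not the spreadsheet's data), one could compare the sample variance of Friday-to-Monday log returns with that of ordinary one-day log returns:

```python
import numpy as np
import pandas as pd

# Hypothetical daily close prices indexed by business days (placeholder data)
dates = pd.bdate_range("2023-01-02", periods=500)
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))), index=dates)

log_ret = np.log(prices).diff().dropna()

# Returns landing on a Monday span the weekend (Friday close to Monday close);
# all other business-day returns span a single day
weekend = log_ret[log_ret.index.dayofweek == 0]
weekday = log_ret[log_ret.index.dayofweek != 0]

ratio = weekend.var() / weekday.var()
print(f"var(Fri->Mon) / var(one-day): {ratio:.2f}")
# A ratio near 3 supports the three-day assumption; a ratio near 1 supports the one-day assumption
```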

SDE interpretation of the 'log difference' method

We believe our data comes from the SDE: $\frac{dS_t}{S_t}=\mu dt + \sigma dW_t$

The solution of this is: $S_t = S_0 e^{(\mu-\frac{\sigma^2}{2})t + \sigma W_t}$

Or in other words: $S_{t+\Delta t}= S_t e^{(\mu-\frac{\sigma^2}{2})\Delta t + \sigma (W_{t+\Delta t} - W_t)}$

$\therefore \ln(S_{t+\Delta t}) - \ln(S_t) = (\mu-\frac{\sigma^2}{2})\Delta t + \sigma (W_{t+\Delta t} - W_t)$

And so, since $W_{t+\Delta t} - W_t \sim N(0, \Delta t)$, we have $var(\ln(S_{t+\Delta t}) - \ln(S_t)) = \sigma^2 \Delta t$
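
A minimal sketch of estimating $\mu$ and $\sigma$ from the log differences, assuming a hypothetical array of daily prices and a 252-trading-day year:

```python
import numpy as np

def fit_gbm(prices, dt=1/252):
    """Estimate mu and sigma of dS/S = mu dt + sigma dW from a price series."""
    log_diff = np.diff(np.log(prices))
    sigma = log_diff.std(ddof=1) / np.sqrt(dt)   # var(log diff) = sigma^2 * dt
    mu = log_diff.mean() / dt + 0.5 * sigma**2   # mean(log diff) = (mu - sigma^2/2) * dt
    return mu, sigma

# Example with made-up prices
mu, sigma = fit_gbm(np.array([100.0, 101.2, 100.7, 102.3, 101.9, 103.0]))
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
```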

Cholesky Decomposition

The Cholesky decomposition allows us to take a set of known correlations between random variables and then construct a matrix which will allow us to recreate these correlations from a set of independent random variables.

2 Random Variables

In the case of two random variables, suppose we know that $X$ and $Y$ are both drawn from $N(0,1)$ and have a correlation of $\rho$.

If we are now given a new random variable $X' \sim N(0,1)$ such that $corr(X, X')=0$, how do we derive $Y$ such that $corr(X,Y)=\rho$?

In other words, what is the matrix $\begin{bmatrix} a & b \\ c & d\end{bmatrix}$ such that $\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} X \\ X' \end{bmatrix} = \begin{bmatrix} X \\ Y \end{bmatrix}$?

We have $X=aX + bX'$ and $Y=cX + dX' \therefore a=1, b=0$

So $cov(X,Y) = ac\,var(X) + bd\,var(X') + (ad + bc)\,cov(X, X')$

$\therefore cov(X,Y) = ac + bd = c$, using $a=1$, $b=0$, $var(X)=var(X')=1$ and $cov(X,X')=0$

$\therefore c = \rho$, since $cov(X,Y) = corr(X,Y) = \rho$ when both variables have unit variance

Given $var (Y) = 1$ we have that $var (cX + dX') = 1$

$\therefore c^2\,var(X) + d^2\,var(X') = 1$ (the cross term vanishes because $cov(X,X')=0$)

$\therefore \rho^2 + d^2 = 1$

$\therefore d=\sqrt{1-\rho^2}$

So our Cholesky decomposition is:

$\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \begin{bmatrix} X \\ X' \end{bmatrix} $
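
A quick sketch verifying that this construction produces the target correlation (sample size and seed are arbitrary):

```python
import numpy as np

rho = 0.7
rng = np.random.default_rng(1)
x, x_prime = rng.standard_normal((2, 100_000))   # independent N(0,1) draws X and X'

y = rho * x + np.sqrt(1 - rho**2) * x_prime      # second row of the Cholesky factor

print(np.corrcoef(x, y)[0, 1])                   # approximately 0.7
```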

What is $L \times L^T$, where $L$ is the lower triangular matrix above?

If we now multiply our matrix by its transpose we observe the following:

$\begin{bmatrix} 1 & 0 \\ \rho & \sqrt{1-\rho^2} \end{bmatrix} \times \begin{bmatrix} 1 & \rho \\ 0 & \sqrt{1-\rho^2} \end{bmatrix} = \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix}$

We notice that we have reconstructed the correlation matrix of the original relationship between the random variables $X$ and $Y$

Generalising the triangular matrix

We can extend this logic to more than 2 random variables

For example for three random variables the Cholesky decomposition is:

$\begin{bmatrix} 1 & \rho_{1,2} & \rho_{1,3} \\ \rho_{1,2} &1& \rho_{2,3} \\ \rho_{1,3} & \rho_{2,3}&1 \end{bmatrix} = \begin{bmatrix} t_{1,1} & 0 & 0 \\ t_{1,2} & t_{2,2} & 0 \\ t_{1,3} & t_{2,3} & t_{3,3} \end{bmatrix} \times \begin{bmatrix} t_{1,1} & t_{1,2} & t_{1,3} \\ 0 & t_{2,2} & t_{2,3} \\0 &0 & t_{3,3} \end{bmatrix}$

Solving gives us:

$\begin{bmatrix} t_{1,1} & 0 & 0 \\ t_{1,2} & t_{2,2} & 0 \\ t_{1,3} & t_{2,3} & t_{3,3} \end{bmatrix} = \begin{bmatrix} 1&0&0 \\ \rho_{1,2} & \sqrt{1-\rho_{1,2}^2}&0\\ \rho_{1,3} & \frac{\rho_{2,3} - \rho_{1,2} \rho_{1,3}}{\sqrt{1-\rho_{1,2}^2}}&\sqrt{1- \rho_{1,3}^2 - \frac{(\rho_{2,3} - \rho_{1,2} \rho_{1,3})^2}{1-\rho_{1,2}^2}} \end{bmatrix}$

This may look complicated, but the system (9 equations, of which only 6 are distinct, in the 6 unknown entries of the triangular matrix) reduces naturally one equation at a time and is very easy to solve.

A numerical algorithm is easy to implement
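
A sketch of such an algorithm, the standard Cholesky-Banachiewicz recursion, written for a symmetric positive-definite correlation (or covariance) matrix; in practice `numpy.linalg.cholesky` does the same job:

```python
import numpy as np

def cholesky(C):
    """Return lower-triangular L such that L @ L.T equals C."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    L = np.zeros_like(C)
    for i in range(n):
        for j in range(i + 1):
            s = C[i, j] - L[i, :j] @ L[j, :j]   # subtract the terms already accounted for
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
    return L
```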

Example

A correlation matrix between $A$, $B$ and $C$ is given by $\begin{bmatrix} 1 & 0.6 & 0.4 \\ 0.6 & 1 & 0.5 \\ 0.4 & 0.5 & 1 \end{bmatrix}$

Show how you would generate the random variables $A$, $B$ and $C$ from independent uniform random variables $U_1$, $U_2$ and $U_3$.
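
A sketch of one possible answer in code, using `numpy.linalg.cholesky` for the factorisation and scipy's inverse-normal transform (`norm.ppf`) to turn the uniforms into standard normals:

```python
import numpy as np
from scipy.stats import norm

corr = np.array([[1.0, 0.6, 0.4],
                 [0.6, 1.0, 0.5],
                 [0.4, 0.5, 1.0]])
L = np.linalg.cholesky(corr)                 # lower-triangular factor

rng = np.random.default_rng(2)
U = rng.uniform(size=(3, 10_000))            # independent U(0,1) draws U1, U2, U3
Z = norm.ppf(U)                              # independent N(0,1) draws
A, B, C = L @ Z                              # correlated N(0,1) draws

print(np.round(np.corrcoef([A, B, C]), 2))   # approximately the target correlation matrix
```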

Least Squares Regression
Introduction

Curve Fitting and Least Squares Regression

If you wish to fit a function of several parameters to a set of data, this can be a very computationally intensive task

If there is only one parameter then using Goal Seek in Excel will probably be perfectly serviceable, but you cannot use this method if you have multiple parameters

First you need to decide what is meant by best fit. For most purposes, minimising the sum of the squared deviations between the actual data and the fitted data will be a sensible metric, hence the expression "Least Squares Regression"

Then we need to decide how to search the vector space of all possible parameter combinations for the combination for which the sum of squared deviations is smallest

We list below a number of methods in order of increasing sophistication.

Simple grid search

We assume:

  • there are $n$ parameters
  • the sum-of-squares function is denoted by $f$
  • the current set of parameters is stored in a vector denoted by $P$

Calculate $f$ at every point in an $n$-dimensional grid and then select the point in the grid at which $f$ is least. This is the set of parameters $P$ which gives the best fit (a sketch follows below).
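
A minimal sketch of the simple grid search for a hypothetical two-parameter least-squares fit (the model, data and grid bounds are placeholders):

```python
import numpy as np
from itertools import product

# Placeholder data generated from a hypothetical two-parameter model y = a * exp(b * x)
x_data = np.linspace(0, 1, 20)
y_data = 2.0 * np.exp(0.5 * x_data)

def f(P):
    """Sum of squared deviations between the data and the fitted model."""
    a, b = P
    return np.sum((y_data - a * np.exp(b * x_data)) ** 2)

# Evaluate f at every point of the grid and keep the point where it is least
grid = [np.linspace(0, 5, 51), np.linspace(-2, 2, 41)]
best_P = min(product(*grid), key=f)
print(best_P, f(best_P))
```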

Orthogonal Grid Search

As per the simple grid search but:

  • Minimise over parameter 1, keeping the other parameters constant, then do the same for parameter 2 and so on; start again with parameter 1 and repeat until the function cannot be reduced by changing any parameter (see the sketch below)
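
A sketch of the orthogonal search, reusing the placeholder `f` and `grid` from the previous sketch and cycling through the parameters one at a time:

```python
def orthogonal_search(f, P0, grids, max_sweeps=100):
    """Minimise f one parameter at a time over its grid, sweeping until no further improvement."""
    P, best = list(P0), f(P0)
    for _ in range(max_sweeps):
        improved = False
        for i, axis in enumerate(grids):
            for v in axis:                      # minimise over parameter i, others held constant
                trial = P[:i] + [v] + P[i + 1:]
                if f(trial) < best:
                    P, best = trial, f(trial)
                    improved = True
        if not improved:                        # no parameter change reduces f any further
            break
    return P, best

# e.g. orthogonal_search(f, [2.5, 0.0], grid) with f and grid defined as in the previous sketch
```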

Steepest Descent Method

  • Calculate the gradient vector at $P_0$ and then minimise in the steepest direction. Once the minimum has been found in this direction recalculate the gradient and repeat the process until you reach a minimum in all directions

Methods 2 and 3 both require us to be able to minimise along an individual line within our hyperspace

Line minimisation

Whichever method we use (apart from the grid search) we will need to minimise our function along a particular line in the vector space of the parameters

Line minimisation is analogous to Newton-Raphson root finding, although we are not solving $f(x)=0$ but rather $f'(x)=0$, i.e. finding the point at which the function (the least-squares deviation) is at a minimum

For Newton-Raphson the assumption we made was local linearity, so this time the assumption we make is that the function is locally quadratic

We can use the first and second derivatives if we have them (a vector and matrix respectively) but if not a simple numerical procedure will suffice

Calculate $f_0=f(\textbf{P})$, $f_1=f(\textbf{P}+h\textbf{x})$, $f_2=f(\textbf{P}+2h\textbf{x})$, where $\textbf{x}$ is the unit vector in parameter space of the direction in which we are minimising and $h$ is a small increment

Then $f''_0 = \frac{(f_2-f_1)-(f_1-f_0)}{h^2}$ is the rate at which the (negative) gradient is getting shallower, and $f'_0 = \frac{f_1-f_0}{h}$ is the current gradient in the direction $\textbf{x}$, so our initial line minimisation guess is $P_{n+1} = P_n - \textbf{x} \times \frac{f'_0}{f''_0}$

We then repeat this process along the same line until we have found our one dimensional minimum - at which point we continue with whatever multi-dimensional method we were using
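
A sketch of this numerical line minimisation, assuming the function behaves roughly quadratically near the minimum and that a sensible increment $h$ has been chosen (no safeguard against a non-positive second derivative is included):

```python
import numpy as np

def line_minimise(f, P, x, h=1e-4, tol=1e-10, max_iter=50):
    """Minimise f along direction x starting from P, using finite differences for f' and f''."""
    P, x = np.asarray(P, dtype=float), np.asarray(x, dtype=float)
    x = x / np.linalg.norm(x)                      # unit vector in the search direction
    for _ in range(max_iter):
        f0, f1, f2 = f(P), f(P + h * x), f(P + 2 * h * x)
        d1 = (f1 - f0) / h                         # f'_0: current gradient along x
        d2 = ((f2 - f1) - (f1 - f0)) / h**2        # f''_0: curvature along x
        step = d1 / d2
        P = P - step * x                           # move to the estimated 1-D minimum
        if abs(step) < tol:                        # converged along this line
            break
    return P
```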

Convergence Speed

On first inspection the second and third methods appear quite good, but in practice they both tend to zig-zag very slowly towards a solution; a much more efficient method is the method of conjugate gradients

Powell's Conjugate Gradient Method

Conjugate Gradient Method

The conjugate gradient method is as follows:

Step 1: Initialise the set of directions $\textbf{u}_i$ to the basis vectors $\textbf{e}_i$

Step 2: Save your starting position as $\textbf{P}_0$

Step 3: For $i=1,...,N$, move $\textbf{P}_{i-1}$ to the minimum along direction $\textbf{u}_i$ and call this point $\textbf{P}_i$

Step 4: For $i=1,...,N-1$, set $\textbf{u}_i$ to be $\textbf{u}_{i+1}$

Step 5: Set $\textbf{u}_N$ to be $\textbf{P}_N - \textbf{P}_0$

Step 6: Move $\textbf{P}_N$ to the minimum along the direction $\textbf{u}_N$ and call this point $\textbf{P}_0$

Step 7: Repeat this basic algorithm $N$ times and any quadratic form will be minimised

The result in step 7 was proved by Powell in 1964

The spreadsheet Powell.xls contains an implementation of the method
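
For readers without the spreadsheet, a hedged Python sketch of the steps above; scipy's Brent minimiser is used as a stand-in for the line minimisation described earlier, and the quadratic test function is a placeholder:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_minimise(f, P, u):
    """Minimise f(P + t*u) over the scalar t (Brent's method) and return the new point."""
    t_best = minimize_scalar(lambda t: f(P + t * u)).x
    return P + t_best * u

def powell(f, P0):
    """Powell's direction-set method (basic form, without the usual refinements)."""
    P0 = np.asarray(P0, dtype=float)
    n = len(P0)
    directions = list(np.eye(n))            # Step 1: start from the basis vectors e_i
    for _ in range(n):                      # Step 7: N repeats minimise a quadratic form
        P_start = P0.copy()                 # Step 2: save the starting position
        P = P0.copy()
        for u in directions:                # Step 3: minimise along each direction in turn
            P = line_minimise(f, P, u)
        directions = directions[1:]         # Step 4: drop the first direction
        new_dir = P - P_start               # Step 5: the overall displacement becomes u_N
        directions.append(new_dir)
        P0 = line_minimise(f, P, new_dir)   # Step 6: minimise along u_N to get the new P_0
    return P0

# Example: a quadratic form with minimum at (1, 2)
quad = lambda P: (P[0] - 1)**2 + 2*(P[1] - 2)**2 + 0.5*(P[0] - 1)*(P[1] - 2)
print(powell(quad, [0.0, 0.0]))             # approximately [1, 2]
```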